COVID-19, short for coronavirus disease 2019, is the illness caused by the SARS-CoV-2 virus, first identified in Wuhan, China, in December 2019. Since then, the virus has spread rapidly across the world, leading the World Health Organization to declare a global pandemic. Millions of Americans have been infected by the virus, and hundreds of thousands have died from the disease, with those numbers continuing to grow each day. A global race to develop a vaccine in record time ensued, with over 100 candidates being tested across the globe. Although multiple vaccines have received emergency authorization in multiple nations, the situation is worsening daily as new variants are identified, such as the one identified in the United Kingdom. In the United States, public health officials are struggling to convince the populace that the vaccines are safe and effective, and widespread anti-vaccine protests seek to slow vaccination efforts, which only gives the virus more time to develop a mutation that defeats the current vaccine formulations.
Thus, analyzing data related to COVID-19 is worthwhile, since it helps people understand the overall situation and severity of the pandemic and may encourage them to adopt protective measures such as mask-wearing, social distancing, and vaccination. In addition, analyzing this data may expose differences between states in how well their regulations contain the virus, which may help state governments ensure that they are using only restrictions that truly work to contain this pathogen.
The COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University is compiled from sources such as, but not limited to, the World Health Organization and the United States Centers for Disease Control and Prevention (a list of all data sources is provided in the README.md file of the repository). It provides case and death counts for each state/U.S. territory for each day since the SARS-CoV-2 virus was first detected in Washington state in January 2020. This data set is known for providing some of the most up-to-date information available, which has led many organizations to cite it as trustworthy and reliable.
| UID | iso2 | iso3 | code3 | FIPS | Admin2 | Province_State | Country_Region | Lat | Long_ | Combined_Key | X1.22.20 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 84001001 | US | USA | 840 | 1001 | Autauga | Alabama | US | 32.53953 | -86.64408 | Autauga, Alabama, US | 0 |
| 84001003 | US | USA | 840 | 1003 | Baldwin | Alabama | US | 30.72775 | -87.72207 | Baldwin, Alabama, US | 0 |
| 84001005 | US | USA | 840 | 1005 | Barbour | Alabama | US | 31.86826 | -85.38713 | Barbour, Alabama, US | 0 |
| 84001007 | US | USA | 840 | 1007 | Bibb | Alabama | US | 32.99642 | -87.12511 | Bibb, Alabama, US | 0 |
| 84001009 | US | USA | 840 | 1009 | Blount | Alabama | US | 33.98211 | -86.56791 | Blount, Alabama, US | 0 |
| 84001011 | US | USA | 840 | 1011 | Bullock | Alabama | US | 32.10031 | -85.71266 | Bullock, Alabama, US | 0 |
| UID | iso2 | iso3 | code3 | FIPS | Admin2 | Province_State | Country_Region | Lat | Long_ | Combined_Key | Population |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 84001001 | US | USA | 840 | 1001 | Autauga | Alabama | US | 32.53953 | -86.64408 | Autauga, Alabama, US | 55869 |
| 84001003 | US | USA | 840 | 1003 | Baldwin | Alabama | US | 30.72775 | -87.72207 | Baldwin, Alabama, US | 223234 |
| 84001005 | US | USA | 840 | 1005 | Barbour | Alabama | US | 31.86826 | -85.38713 | Barbour, Alabama, US | 24686 |
| 84001007 | US | USA | 840 | 1007 | Bibb | Alabama | US | 32.99642 | -87.12511 | Bibb, Alabama, US | 22394 |
| 84001009 | US | USA | 840 | 1009 | Blount | Alabama | US | 33.98211 | -86.56791 | Blount, Alabama, US | 57826 |
| 84001011 | US | USA | 840 | 1011 | Bullock | Alabama | US | 32.10031 | -85.71266 | Bullock, Alabama, US | 10101 |
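The daily count columns in these tables carry names such as `X1.22.20`. As a minimal sketch, assuming the names always consist of an `X` followed by month, day, and two-digit year separated by periods, such a name could be turned into a calendar date as follows. The function name `parse_date_column` is hypothetical and not part of the repository.

```python
from datetime import date, datetime


def parse_date_column(name: str) -> date:
    """Convert a column name such as 'X1.22.20' into a date (hypothetical helper)."""
    if not name.startswith("X"):
        raise ValueError(f"not a date column: {name!r}")
    # When parsing, %m and %d accept one- or two-digit values;
    # %y is the two-digit year without century.
    return datetime.strptime(name[1:], "%m.%d.%y").date()


print(parse_date_column("X1.22.20"))  # 2020-01-22
```

Converting the names to real dates makes it possible to sort the columns chronologically and to pick out the most recent day.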
- Admin2: name of county/political subdivision of U.S. state/territory
- Province_State: name of U.S. state/territory
- Xmm.dd.yy: one feature per day since the SARS-CoV-2 virus was first detected in the United States, representing the case/death count of the county/political subdivision defined by the Admin2 feature; takes the format Xmm.dd.yy, where mm is the one- or two-digit month as a decimal, dd is the one- or two-digit day of the month as a decimal, and yy is the two-digit year without century as a decimal

The Homeland Infrastructure Foundation-Level Data Hospitals (HIFLD Hospitals) data set, published by the United States Department of Homeland Security and compiled from sources at the United States Department of Health & Human Services and the Centers for Disease Control and Prevention, provides a list of all hospitals in the United States and their associated trauma level. It identifies how many hospitals, and of what type, exist in each state.
| NAME | STATE | TYPE | BEDS | TRAUMA |
|---|---|---|---|---|
| CENTRAL VALLEY GENERAL HOSPITAL | CA | GENERAL ACUTE CARE | 49 | NA |
| LOS ROBLES HOSPITAL & MEDICAL CENTER - EAST CAMPUS | CA | GENERAL ACUTE CARE | 62 | NA |
| EAST LOS ANGELES DOCTORS HOSPITAL | CA | GENERAL ACUTE CARE | 127 | NA |
| SOUTHERN CALIFORNIA HOSPITAL AT HOLLYWOOD | CA | GENERAL ACUTE CARE | 100 | NA |
| KINDRED HOSPITAL BALDWIN PARK | CA | GENERAL ACUTE CARE | 95 | NA |
| LAKEWOOD REGIONAL MEDICAL CENTER | CA | GENERAL ACUTE CARE | 172 | NA |
- STATE: two-letter U.S.P.S. abbreviation of state name
- TYPE: type of hospital; value can be "GENERAL ACUTE CARE", "CRITICAL ACCESS", "PSYCHIATRIC", "LONG TERM CARE", "REHABILITATION", "MILITARY", "SPECIAL", "CHILDREN", "WOMEN", or "CHRONIC DISEASE"
- STATUS: current status of hospital; value either "OPEN" or "CLOSED"
- LATITUDE: latitude of hospital
- LONGITUDE: longitude of hospital
- BEDS: number of beds available at hospital; value of -999 represents an unknown count of beds
- TRAUMA: non-standard trauma center level identifier (definitions can be found in the HIFLD Trauma Levels Data Set); value of "NOT AVAILABLE" indicates the hospital is not classified as a trauma center

The NYT Mask-Wearing Survey data set contains county-level estimates of mask usage in the US, derived from 250,000 survey responses. Each participant was asked “How often do you wear a mask in public when you expect to be within six feet of another person?” and given the choices of never, rarely, sometimes, frequently, or always. The survey was conducted from July 2 to July 14, 2020, and was assembled by The New York Times and Dynata.
| COUNTYFP | NEVER | RARELY | SOMETIMES | FREQUENTLY | ALWAYS |
|---|---|---|---|---|---|
| 1001 | 0.053 | 0.074 | 0.134 | 0.295 | 0.444 |
| 1003 | 0.083 | 0.059 | 0.098 | 0.323 | 0.436 |
| 1005 | 0.067 | 0.121 | 0.120 | 0.201 | 0.491 |
| 1007 | 0.020 | 0.034 | 0.096 | 0.278 | 0.572 |
| 1009 | 0.053 | 0.114 | 0.180 | 0.194 | 0.459 |
| 1011 | 0.031 | 0.040 | 0.144 | 0.286 | 0.500 |
COUNTYFP column is the FIPS code for the county.
The COVID-19 Vaccinations in the United States data set contains the number of vaccine doses administered by state. Data on COVID-19 vaccine doses administered in the United States are collected by vaccination providers and reported to the CDC through multiple sources, including jurisdictions, pharmacies, and federal entities, which use various reporting methods, including Immunization Information Systems, the Vaccine Administration Management System, and direct data submission.
| State | Total_Doses_Administered | Doses_Administered_per_100k | X18._Doses_Administered | X18._Doses_Administered_per_100K |
|---|---|---|---|---|
| Alaska | 239927 | 32797 | 238872 | 43308 |
| Alabama | 815108 | 16624 | 814893 | 21361 |
| Arkansas | 540192 | 17900 | 540003 | 23300 |
| American Samoa | 18816 | 33788 | 18600 | 42821 |
| Arizona | 1525794 | 20962 | 1524293 | 27034 |
| Bureau of Prisons | 52743 | NA | 52740 | NA |
Total doses administered column is the total number of vaccine doses that have been given to people.
Doses administered per 100k column is the total number of vaccine doses given for every 100,000 people in the overall population.
18+ Doses Administered column is the total number of vaccine doses that have been given to people aged 18 years and older.
18+ Doses administered per 100k column is the total number of vaccine doses given for every 100,000 people aged 18 years and older.
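The per-100k columns are rates of the form doses divided by population, scaled to 100,000 people. A minimal sketch of that arithmetic is below. The helper name `doses_per_100k` is hypothetical, and the Alabama population figure is an assumption (a 2019 estimate that does not appear in this data set), so the real calculation may use a different denominator.

```python
from typing import Optional


def doses_per_100k(total_doses: int, population: Optional[int]) -> Optional[int]:
    """Doses administered per 100,000 people, or None when no population applies."""
    if population is None:
        # Entities such as "Bureau of Prisons" have no population, hence NA.
        return None
    return round(total_doses * 100_000 / population)


# ASSUMPTION: population estimate for Alabama, not taken from the data set.
alabama_population = 4_903_185

print(doses_per_100k(815_108, alabama_population))  # 16624
print(doses_per_100k(52_743, None))                 # None
```

With this assumed population, the result agrees with the 16624 shown for Alabama in the table, which suggests the column is computed along these lines.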
The Infection rates before and after stay at home orders went into effect data set contains a list of states and the date on which each state’s first stay at home order was put into effect. It also has infection rates for the days before and after these orders were enacted. Infection rates were calculated using daily COVID-19 case counts collected by the Johns Hopkins Center for Health Security.
| State | Order.date | Infection.rate.and.confidence.interval..before.order. | Infection.rate.and.confidence.interval..after.order. |
|---|---|---|---|
| Alabama | 4/4/20 | 0.099 (0.088, 0.109) | 0.042 (0.039, 0.045) |
| Alaska | 3/28/20 | 0.11 (0.095, 0.126) | 0.03 (0.027, 0.032) |
| Arizona | 3/31/20 | 0.134 (0.124, 0.143) | 0.03 (0.025, 0.036) |
| California | 3/19/20 | 0.084 (0.077, 0.091) | 0.055 (0.05, 0.06) |
| Colorado | 3/26/20 | 0.11 (0.1, 0.121) | 0.04 (0.035, 0.044) |
| Connecticut | 3/23/20 | 0.154 (0.136, 0.172) | 0.065 (0.059, 0.07) |
State column is the name of each U.S. state for which data was available.
Order.date column is the date on which the first stay at home order was put into effect.
Infection.rate.and.confidence.interval.before.order column is the infection rate, and the confidence interval for this rate, before the order went into effect.
Infection.rate.and.confidence.interval.after.order column is the infection rate, and the confidence interval for this rate, after the order went into effect.
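Because each infection rate cell packs a point estimate and a confidence interval into one string, such as `0.099 (0.088, 0.109)`, the values have to be split before they can be used numerically. The following is a minimal sketch of one way to do so. The names `RateWithCI` and `parse_rate` are hypothetical, and the pattern assumes every cell follows the layout seen in the sample rows.

```python
import re
from typing import NamedTuple


class RateWithCI(NamedTuple):
    rate: float
    lower: float
    upper: float


# Matches "rate (lower, upper)", e.g. "0.099 (0.088, 0.109)".
_PATTERN = re.compile(r"^\s*([\d.]+)\s*\(\s*([\d.]+)\s*,\s*([\d.]+)\s*\)\s*$")


def parse_rate(text: str) -> RateWithCI:
    """Split a combined cell into its rate and confidence bounds."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unexpected cell format: {text!r}")
    rate, lower, upper = (float(group) for group in match.groups())
    return RateWithCI(rate, lower, upper)


# Alabama's row from the sample table.
before = parse_rate("0.099 (0.088, 0.109)")
after = parse_rate("0.042 (0.039, 0.045)")
print(round(before.rate - after.rate, 3))  # 0.057
```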
Between 2020-01-22 and 2021-02-24, 28,336,097 total cases of COVID-19 were detected in the United States, and 505,890 total deaths were ruled as being caused by COVID-19.
| date | total_cases | total_deaths |
|---|---|---|
| Min. :2020-01-22 | Min. : 1 | Min. : 0 |
| 1st Qu.:2020-04-30 | 1st Qu.: 1107214 | 1st Qu.: 67774 |
| Median :2020-08-08 | Median : 5022981 | Median :163216 |
| Mean :2020-08-08 | Mean : 7786083 | Mean :177519 |
| 3rd Qu.:2020-11-16 | 3rd Qu.:11337674 | 3rd Qu.:249572 |
| Max. :2021-02-24 | Max. :28336097 | Max. :505890 |
As seen in the distributions of cases and deaths by state, California and Texas both appear as outliers, with higher numbers of both cases and deaths. However, the large populations of these states offer a possible explanation for the higher numbers. Additionally, the epidemiologic data suggests that mutated variants of SARS-CoV-2 that are more infectious and transmissible may be to blame for the high number of cases in these states.
As seen in the above visualizations of the geographic distributions of hospitals and trauma centers in the United States, health care institutions tend to be located around population centers. The distributions also show that larger states with larger populations have more hospitals and trauma centers, and are more likely to have lower-level (level I and II) trauma centers. Additionally, lower-level trauma centers have, on average, more beds for patients than facilities with a higher trauma level.
| BEDS |
|---|
| Min. : 2.0 |
| 1st Qu.: 29.0 |
| Median : 89.0 |
| Mean : 158.8 |
| 3rd Qu.: 221.8 |
| Max. :1592.0 |
| NA’s :180 |
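The BEDS column uses -999 to mark an unknown bed count, so the sentinel needs to be treated as missing before any summary is computed. The sketch below shows one way to do that, on the assumption (not stated in the source) that the NA values in the summary above arose this way. The helper `clean_beds` is hypothetical, and the sentinel entry in the sample is added purely for illustration.

```python
from statistics import mean, median

UNKNOWN_BEDS = -999  # sentinel for an unknown bed count


def clean_beds(raw_counts):
    """Replace the -999 sentinel with None so it cannot distort summaries."""
    return [None if count == UNKNOWN_BEDS else count for count in raw_counts]


# The six BEDS values from the sample hospital table, plus one
# illustrative sentinel value that is NOT taken from the data.
raw = [49, 62, 127, 100, 95, 172, -999]
beds = clean_beds(raw)
known = [count for count in beds if count is not None]
missing = len(beds) - len(known)

print("missing:", missing)              # missing: 1
print("median:", median(known))         # median: 97.5
print("mean:", round(mean(known), 1))   # mean: 100.8
```

Leaving the sentinel in place would drag the minimum and the mean far below any real bed count.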
As seen in the box plot, there are quite a few outliers in the distribution of beds among trauma center levels. This is likely due to the different populations of different regions, as facilities in more highly populated areas need more beds for patients than those in rural areas. It is likely that trauma centers are created based not on population but rather on geographic distance to another facility able to provide the same level of care.
Averaged across counties, 51% of the responses are “Always” and 8% of the responses are “Never.” For a single county, the values for each response are supposed to sum to one. In practice, the values are rounded to three decimal places, so the sum for each county ranges from 0.998 to 1.002.
| NEVER | RARELY | SOMETIMES | FREQUENTLY | ALWAYS | sum |
|---|---|---|---|---|---|
| Min. :0.00000 | Min. :0.00000 | Min. :0.0010 | Min. :0.0290 | Min. :0.1150 | Min. :0.998 |
| 1st Qu.:0.03400 | 1st Qu.:0.04000 | 1st Qu.:0.0790 | 1st Qu.:0.1640 | 1st Qu.:0.3932 | 1st Qu.:1.000 |
| Median :0.06800 | Median :0.07300 | Median :0.1150 | Median :0.2040 | Median :0.4970 | Median :1.000 |
| Mean :0.07994 | Mean :0.08292 | Mean :0.1213 | Mean :0.2077 | Mean :0.5081 | Mean :1.000 |
| 3rd Qu.:0.11300 | 3rd Qu.:0.11500 | 3rd Qu.:0.1560 | 3rd Qu.:0.2470 | 3rd Qu.:0.6138 | 3rd Qu.:1.000 |
| Max. :0.43200 | Max. :0.38400 | Max. :0.4220 | Max. :0.5490 | Max. :0.8890 | Max. :1.002 |
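The rounding effect on the row sums can be checked directly. The sketch below recomputes the sum for the six sample counties shown earlier and confirms that each one falls within the 0.998 to 1.002 range. The tolerance constant and variable names are illustrative choices, not part of the data set.

```python
# (NEVER, RARELY, SOMETIMES, FREQUENTLY, ALWAYS) for the six sample counties.
responses = {
    1001: (0.053, 0.074, 0.134, 0.295, 0.444),
    1003: (0.083, 0.059, 0.098, 0.323, 0.436),
    1005: (0.067, 0.121, 0.120, 0.201, 0.491),
    1007: (0.020, 0.034, 0.096, 0.278, 0.572),
    1009: (0.053, 0.114, 0.180, 0.194, 0.459),
    1011: (0.031, 0.040, 0.144, 0.286, 0.500),
}

TOLERANCE = 0.002  # widest deviation from 1 reported in the summary

# Round to three decimals so floating-point noise does not hide the
# real rounding error in the published values.
sums = {county: round(sum(values), 3) for county, values in responses.items()}

for county, total in sums.items():
    status = "ok" if abs(total - 1.0) <= TOLERANCE + 1e-9 else "check"
    print(county, total, status)
```

Among these rows, county 1003 sums to 0.999 and county 1011 to 1.001, while the others sum to exactly 1.000.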
There don’t seem to be any significant outliers. This is probably because there were 250,000 survey responses for a survey with only five options, so any individual county would need a very unusual mix of responses to become an outlier. There is also less chance for outliers because this data set was grouped into counties, forcing the columns in each row to sum to one. There are no NA values, and the data set seems to have data for almost every county.
As of February 22, 2021, 68,150,728 vaccine doses had been administered in the US. Averaged across states, 21,242 doses had been administered per 100,000 people (21.2415 per 100 people). The number of doses administered per 100,000 people ranges from 11,767 to 39,499.
| Total_Doses_Administered | Doses_Administered_per_100k | X18._Doses_Administered | X18._Doses_Administered_per_100K |
|---|---|---|---|
| Min. : 7073 | Min. :11767 | Min. : 7073 | Min. :15081 |
| 1st Qu.: 241471 | 1st Qu.:18891 | 1st Qu.: 240832 | 1st Qu.:24127 |
| Median : 614928 | Median :19881 | Median : 614420 | Median :25428 |
| Mean :1097822 | Mean :21231 | Mean :1096961 | Mean :27224 |
| 3rd Qu.:1396224 | 3rd Qu.:22824 | 3rd Qu.:1395704 | 3rd Qu.:28548 |
| Max. :7728120 | Max. :39499 | Max. :7724412 | Max. :50641 |
The most significant outlier in the data set is the total number of doses administered in California. One possible reason is that California’s population base is large, so a great number of people are receiving the vaccine; the overall education level in the state may also play a role.
Between 2020-03-19 and 2020-04-07, the states in this data set instituted their first stay at home orders. The average decrease in COVID-19 infection rates following these orders was 0.07049, with a maximum decrease of 0.143 and a minimum decrease of 0.018.
| Order.date | Number.of.days.before.order | Number.of.days.after.order | infection.rate.before.order | infection.rate.after.order | infection.rate.change |
|---|---|---|---|---|---|
| Min. :2020-03-19 | Min. : 4.00 | Min. : 6.00 | Min. :0.0720 | Min. :0.01900 | Min. :0.01800 |
| 1st Qu.:2020-03-24 | 1st Qu.:12.00 | 1st Qu.:12.00 | 1st Qu.:0.0985 | 1st Qu.:0.03400 | 1st Qu.:0.05050 |
| Median :2020-03-27 | Median :16.00 | Median :17.00 | Median :0.1100 | Median :0.04200 | Median :0.07000 |
| Mean :2020-03-27 | Mean :16.37 | Mean :16.33 | Mean :0.1147 | Mean :0.04426 | Mean :0.07049 |
| 3rd Qu.:2020-04-01 | 3rd Qu.:20.00 | 3rd Qu.:20.00 | 3rd Qu.:0.1240 | 3rd Qu.:0.05600 | 3rd Qu.:0.08650 |
| Max. :2020-04-07 | Max. :28.00 | Max. :25.00 | Max. :0.1970 | Max. :0.06600 | Max. :0.14300 |
Among the infection rates before and after stay at home orders were imposed, there were a few outliers in the pre-order infection rates as well as in the difference between the pre-order and post-order rates. This is probably due to inconsistency in the amount of time between the two measurements from state to state, as well as differences in population density among those states. The outliers for infection rates before orders were imposed occurred in Alabama, Alaska, and Arizona, and the values were 0.072, 0.079, and 0.081. The outlier for infection rate change occurred in West Virginia, and the value was 0.018.
In every state, there are many different hospitals of many different sizes with many different capabilities. With this knowledge, the question can be asked: does the number of hospitals of each type affect the death rate of COVID-19, and if it does, how? In this question, the “type” of hospital refers to the trauma center level (potentially) assigned to a hospital based on its capability to handle trauma patients, as defined by the American College of Surgeons. Trauma centers are assigned a level from I to V. Level I trauma centers have the most advanced capabilities, with surgeons and specialists available at any time, whereas level V trauma centers are capable of diagnosing and stabilizing trauma patients long enough for them to survive transfer to a lower-level (more advanced) trauma center. Additionally, the death rate of COVID-19 is a cause-specific death rate, meaning it measures the frequency of death from a specific cause in a defined population over a specified interval. In this instance, it is measured in deaths per 100,000 members of the population.
To determine this, the number of hospitals of each type in the 50 states and other U.S. territories was compared to the death rate measured in each location since the SARS-CoV-2 virus was first detected in the United States. To do this, the HIFLD Hospitals data was used along with the JHU CSSE COVID-19 data. The hospital data was filtered to include only relevant hospitals (i.e., general acute care hospitals rather than psychiatric or rehabilitation hospitals), and a standardized ACS trauma level was applied to the relevant observations, since different states use different methods of denoting trauma levels. The result was then aggregated by state to produce a count of the different types of hospitals in each state and territory, which can be seen in the following table. Death counts over time were then imported from the JHU CSSE COVID-19 data set, and the resulting data frame was transformed into long form to provide a total death count for each state and territory for each day since the virus first appeared in the U.S. The most recent day’s death totals were selected and combined with each region’s population data so that a death rate could be calculated. These death rates were then joined with the hospital counts for each state and plotted on scatter plots faceted by type of hospital. A linear regression trend line was then applied to each plot, as shown in the plots below.
| State | Non-trauma Hospitals | Level I | Level II | Level III | Level IV | Level V |
|---|---|---|---|---|---|---|
| AK | 8 | 0 | 3 | 0 | 18 | 0 |
| AL | 51 | 4 | 2 | 52 | 0 | 0 |
| AR | 34 | 1 | 3 | 17 | 35 | 0 |
| AS | 1 | 0 | 0 | 0 | 0 | 0 |
| AZ | 57 | 11 | 0 | 8 | 27 | 0 |
| CA | 392 | 15 | 35 | 13 | 5 | 0 |
| CO | 22 | 3 | 11 | 26 | 32 | 0 |
| CT | 21 | 3 | 7 | 1 | 0 | 0 |
| DC | 6 | 3 | 0 | 0 | 0 | 0 |
| DE | 3 | 1 | 0 | 6 | 0 | 0 |
| FL | 230 | 10 | 23 | 0 | 0 | 0 |
| GA | 137 | 5 | 9 | 8 | 8 | 0 |
| GU | 3 | 0 | 0 | 0 | 0 | 0 |
| HI | 19 | 1 | 1 | 2 | 0 | 0 |
| IA | 15 | 2 | 4 | 19 | 90 | 0 |
| ID | 42 | 0 | 3 | 1 | 0 | 0 |
| IL | 143 | 17 | 40 | 0 | 0 | 0 |
| IN | 116 | 3 | 6 | 13 | 0 | 0 |
| KS | 106 | 2 | 2 | 5 | 35 | 0 |
| KY | 87 | 2 | 1 | 4 | 12 | 0 |
| LA | 172 | 2 | 4 | 3 | 0 | 0 |
| MA | 71 | 11 | 1 | 7 | 0 | 0 |
| MD | 49 | 1 | 4 | 3 | 0 | 0 |
| ME | 31 | 1 | 2 | 0 | 0 | 0 |
| MI | 132 | 8 | 23 | 7 | 0 | 0 |
| MN | 30 | 4 | 5 | 19 | 77 | 0 |
| MO | 70 | 15 | 20 | 27 | 3 | 0 |
| MP | 1 | 0 | 0 | 0 | 0 | 0 |
| MS | 27 | 1 | 3 | 15 | 61 | 0 |
| MT | 25 | 0 | 4 | 4 | 8 | 24 |
| NC | 119 | 6 | 3 | 8 | 0 | 0 |
| ND | 46 | 1 | 5 | 0 | 0 | 0 |
| NE | 49 | 1 | 3 | 5 | 37 | 0 |
| NH | 20 | 1 | 2 | 6 | 1 | 0 |
| NJ | 74 | 4 | 6 | 0 | 0 | 0 |
| NM | 36 | 1 | 0 | 6 | 6 | 0 |
| NV | 50 | 1 | 2 | 2 | 0 | 0 |
| NY | 171 | 20 | 13 | 10 | 0 | 0 |
| OH | 168 | 12 | 10 | 20 | 0 | 0 |
| OK | 30 | 2 | 2 | 26 | 73 | 0 |
| OR | 19 | 2 | 6 | 10 | 27 | 0 |
| PA | 176 | 18 | 12 | 2 | 1 | 0 |
| PR | 61 | 0 | 0 | 0 | 0 | 0 |
| PW | 1 | 0 | 0 | 0 | 0 | 0 |
| RI | 13 | 1 | 0 | 0 | 0 | 0 |
| SC | 79 | 5 | 1 | 1 | 0 | 0 |
| SD | 11 | 0 | 3 | 1 | 7 | 38 |
| TN | 121 | 7 | 2 | 7 | 0 | 0 |
| TX | 268 | 16 | 21 | 54 | 195 | 0 |
| UT | 32 | 2 | 3 | 5 | 11 | 3 |
| VA | 87 | 5 | 7 | 5 | 0 | 0 |
| VI | 2 | 0 | 0 | 0 | 0 | 0 |
| VT | 14 | 1 | 0 | 0 | 0 | 0 |
| WA | 27 | 1 | 8 | 22 | 35 | 14 |
| WI | 50 | 3 | 9 | 33 | 50 | 0 |
| WV | 28 | 2 | 3 | 3 | 24 | 0 |
| WY | 5 | 0 | 2 | 4 | 11 | 9 |
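The death rate side of this pipeline (reshape the wide daily columns into long form, keep the most recent day, aggregate by state, and divide by population) can be sketched as follows. This is an illustration only, not the code used for the analysis: the county rows, counts, and populations are made up, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# HYPOTHETICAL county rows in the wide layout; all numbers are invented.
rows = [
    {"Province_State": "Alabama", "Population": 50_000, "X2.23.21": 90, "X2.24.21": 100},
    {"Province_State": "Alabama", "Population": 150_000, "X2.23.21": 190, "X2.24.21": 200},
    {"Province_State": "Alaska", "Population": 100_000, "X2.23.21": 50, "X2.24.21": 50},
]


def to_long(wide_rows):
    """Turn one row per county into one (state, day, deaths) record per county-day."""
    long_rows = []
    for row in wide_rows:
        for key, value in row.items():
            if key.startswith("X"):  # date columns look like Xmm.dd.yy
                day = datetime.strptime(key[1:], "%m.%d.%y").date()
                long_rows.append((row["Province_State"], day, value))
    return long_rows


def death_rate_per_100k(wide_rows):
    """Deaths per 100,000 residents for each state on the most recent day."""
    long_rows = to_long(wide_rows)
    latest = max(day for _, day, _ in long_rows)
    deaths = defaultdict(int)
    for state, day, count in long_rows:
        if day == latest:
            deaths[state] += count
    population = defaultdict(int)
    for row in wide_rows:
        population[row["Province_State"]] += row["Population"]
    return {
        state: deaths[state] * 100_000 / population[state]
        for state in deaths
        if population[state] > 0
    }


rates = death_rate_per_100k(rows)
print(rates)  # {'Alabama': 150.0, 'Alaska': 50.0}
```

The resulting per-state rates are what would then be joined to the hospital counts in the table above.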
Based on the results of the analysis, there does not appear to be a significant relationship between the number or types of hospitals in a state or territory and the death rate from COVID-19. This is evidenced largely by the six faceted scatter plots: in each plot, the trend line clearly does not depict a significant correlation between the facility count and the death rate. Interestingly, and contrary to my expectation, three of the six trends actually showed slight increases in the death rate with increases in the number of hospitals, specifically for level I, level II, and non-trauma hospitals. Level V trauma centers showed a slight downward trend, whereas the level III and level IV centers appeared to have no correlation at all with the death rate.
One of the most interesting things this might suggest is the importance of wearing a mask and practicing proper social distancing. As seen in the geographic distribution of trauma centers, lower-level (level I and II) trauma centers and non-trauma hospitals tend to be grouped near population hotspots in large cities such as Los Angeles, Houston, Chicago, and New York, to name a few. This suggests that the death rate is tied more closely to the ability of the virus to spread among individuals, which is high in these large urban areas. This would support what public health officials have been saying all along: that it is critical for everyone, but especially those who often come into contact with people outside of their household, to wash their hands, wear a mask, and keep their distance.
Rate of cases here is the ratio of new cases to total population for each state on the most recent date. Percent of vaccination population is the percentage of the population that has been given a vaccine.
To do the data analysis, I first calculated the new cases in each state on the most recent date. I then calculated the rate of cases by taking the ratio of new cases to the population of each state. Lastly, I compared the rate of cases with the percent of vaccination to look for a potential relationship.
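The case rate calculation described above can be sketched in a few lines. The sketch assumes that the case counts are cumulative totals, so that new cases on the most recent date are the difference between the last two totals. The function name and the numbers used are hypothetical.

```python
def latest_case_rate(cumulative_cases, population):
    """New cases on the most recent day divided by total population."""
    if len(cumulative_cases) < 2:
        raise ValueError("need at least two days of cumulative counts")
    if population <= 0:
        raise ValueError("population must be positive")
    # ASSUMPTION: counts are running totals, so the daily figure is a difference.
    new_cases = cumulative_cases[-1] - cumulative_cases[-2]
    return new_cases / population


# HYPOTHETICAL state: 15 new cases on the last day among 100,000 residents.
rate = latest_case_rate([1000, 1010, 1025], 100_000)
print(f"{rate:.7f}")  # 0.0001500
```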
| State | Most Recent Ratio of Vaccination Population | Case Rate at Most Recent Date |
|---|---|---|
| New Mexico | 0.29211 | 0.0002119 |
| South Dakota | 0.26816 | 0.0002922 |
| West Virginia | 0.26261 | 0.0001375 |
| North Dakota | 0.25795 | 0.0001493 |
| Connecticut | 0.25044 | 0.0004202 |
| Wyoming | 0.23527 | 0.0000757 |
| Vermont | 0.23091 | 0.0001235 |
| Oklahoma | 0.22911 | 0.0002005 |
| Montana | 0.22738 | 0.0001862 |
| Maine | 0.22275 | 0.0001211 |
| Massachusetts | 0.21764 | 0.0003041 |
| Wisconsin | 0.21634 | 0.0001437 |
| Colorado | 0.21419 | 0.0001982 |
| Arizona | 0.20962 | 0.0001742 |
| Minnesota | 0.20751 | 0.0001321 |
| Virginia | 0.20689 | 0.0002216 |
| Florida | 0.20514 | 0.0003248 |
| Oregon | 0.20508 | 0.0000972 |
| Nebraska | 0.20304 | 0.0001947 |
| North Carolina | 0.20196 | 0.0003127 |
| New Hampshire | 0.20171 | 0.0002441 |
| Rhode Island | 0.19980 | 0.0004296 |
| Michigan | 0.19881 | 0.0001559 |
| New Jersey | 0.19804 | 0.0003515 |
| Washington | 0.19614 | 0.0001118 |
| California | 0.19559 | 0.0001452 |
| Louisiana | 0.19518 | 0.0001895 |
| Iowa | 0.19490 | 0.0002443 |
| Nevada | 0.19428 | 0.0001620 |
| New York | 0.19393 | 0.0003181 |
| Indiana | 0.19293 | 0.0001478 |
| Delaware | 0.19165 | 0.0002807 |
| Ohio | 0.19067 | 0.0001572 |
| Illinois | 0.19066 | 0.0001598 |
| Utah | 0.19060 | 0.0002453 |
| Kentucky | 0.19058 | 0.0002888 |
| Idaho | 0.19050 | 0.0002274 |
| Maryland | 0.18732 | 0.0001421 |
| Pennsylvania | 0.18599 | 0.0002178 |
| South Carolina | 0.18279 | 0.0004040 |
| Missouri | 0.18101 | 0.0001002 |
| Arkansas | 0.17900 | 0.0002647 |
| Kansas | 0.17759 | 0.0003764 |
| Georgia | 0.17711 | 0.0002992 |
| Mississippi | 0.16990 | 0.0002255 |
| Tennessee | 0.16691 | 0.0002349 |
| Alabama | 0.16624 | 0.0002527 |
| Texas | 0.16555 | 0.0002609 |
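One common way to quantify the relationship between the two columns of the table above is the Pearson correlation coefficient. The sketch below is not necessarily the measure used in this analysis. It is applied only to the first five rows of the table, so its output is not the value for the full data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# First five rows of the table only (New Mexico through Connecticut).
vaccination_ratio = [0.29211, 0.26816, 0.26261, 0.25795, 0.25044]
case_rate = [0.0002119, 0.0002922, 0.0001375, 0.0001493, 0.0004202]

r = pearson(vaccination_ratio, case_rate)
print(round(r, 3))
```

Values near -1 or 1 indicate a strong linear relationship, while values near 0 indicate a weak one.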
According to both the quantitative and graphical results, there is only a slight correlation between the ratio of vaccinated population and the case rate in each state on the most recent date.
This result was unexpected to me. I originally expected this correlation to be strong, on the assumption that as the percentage of the population vaccinated goes up, the case rate goes down. The possible reasons for this unexpected result are:
Since the beginning of the COVID-19 pandemic, states have put stay at home orders into place in an effort to curb the spread of the virus. But do these orders really have a substantial effect on infection rates throughout the U.S.?
The way in which my data was collected makes this slightly challenging. Since the infection rate values were collected at different times before and after the start of lock-down in each state, I need a standardized measurement of the time spent in lock-down. For this, I will use the ratio of the number of days between the start of each state’s lock-down and the date on which the second infection rate value was collected to the number of days between the date on which the first infection rate value was collected and the start of the lock-down. Then, to standardize the change in infection rate from the pre-lock-down measurement to the post-lock-down measurement across different amounts of time, I will divide that change by the number of days between the two measurements. To look for a relationship between time spent under a stay at home order and the change in infection rates, I will plot the average daily infection rate decrease on the y-axis and the ratio on the x-axis and add a line of best fit.
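The two standardized quantities described above can be sketched as follows. The function name is hypothetical. The infection rates are California’s values from the sample table, while the day counts (28 before, 25 after) are assumed values, chosen because they reproduce California’s ratio and daily decrease as reported in this analysis.

```python
def standardize(rate_before, rate_after, days_before, days_after):
    """Return (ratio of days after to days before, average daily rate decrease)."""
    if days_before <= 0 or days_after <= 0:
        raise ValueError("day counts must be positive")
    ratio = days_after / days_before
    # The two measurements are days_before + days_after days apart.
    daily_decrease = (rate_before - rate_after) / (days_before + days_after)
    return ratio, daily_decrease


# Rates from California's row; day counts are ASSUMED, not stated in the source.
ratio, daily_decrease = standardize(0.084, 0.055, days_before=28, days_after=25)
print(round(ratio, 7), round(daily_decrease, 7))  # 0.8928571 0.0005472
```

Dividing by the full span between measurements keeps states measured over longer windows from appearing to have larger effects.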
| State | Order.date | ratio.between.days.after.and.before.order | average.daily.infection.rate.decrease |
|---|---|---|---|
| South Carolina | 2020-04-07 | 0.2142857 | 0.0018529 |
| Missouri | 2020-04-06 | 0.3333333 | 0.0030000 |
| Florida | 2020-04-03 | 0.3703704 | 0.0024595 |
| Georgia | 2020-04-03 | 0.3846154 | 0.0019444 |
| Texas | 2020-04-02 | 0.4230769 | 0.0019730 |
| Alabama | 2020-04-04 | 0.4285714 | 0.0019000 |
| Tennessee | 2020-04-02 | 0.5000000 | 0.0026061 |
| Pennsylvania | 2020-04-01 | 0.5217391 | 0.0022571 |
| Mississippi | 2020-04-03 | 0.5263158 | 0.0025172 |
| Nevada | 2020-04-01 | 0.5714286 | 0.0022727 |
| Maine | 2020-04-02 | 0.6111111 | 0.0018276 |
| District of Columbia | 2020-04-01 | 0.7058824 | 0.0016207 |
| Arizona | 2020-03-31 | 0.7222222 | 0.0033548 |
| Maryland | 2020-03-30 | 0.7368421 | 0.0017576 |
| North Carolina | 2020-03-30 | 0.7368421 | 0.0023333 |
| Virginia | 2020-03-30 | 0.7368421 | 0.0014545 |
| California | 2020-03-19 | 0.8928571 | 0.0005472 |
| Washington | 2020-03-23 | 0.9130435 | 0.0015455 |
| Kansas | 2020-03-30 | 0.9333333 | 0.0022759 |
| Rhode Island | 2020-03-28 | 1.0000000 | 0.0007500 |
| Colorado | 2020-03-26 | 1.0588235 | 0.0020000 |
| Minnesota | 2020-03-27 | 1.1333333 | 0.0021250 |
| New York | 2020-03-22 | 1.1578947 | 0.0029024 |
| Massachusetts | 2020-03-24 | 1.1764706 | 0.0011081 |
| Kentucky | 2020-03-26 | 1.2000000 | 0.0016970 |
| Oregon | 2020-03-23 | 1.3125000 | 0.0011351 |
| New Hampshire | 2020-03-27 | 1.4166667 | 0.0013448 |
| Indiana | 2020-03-24 | 1.4285714 | 0.0018235 |
| Montana | 2020-03-28 | 1.4545455 | 0.0031852 |
| Wisconsin | 2020-03-25 | 1.4615385 | 0.0030313 |
| Michigan | 2020-03-24 | 1.6666667 | 0.0044688 |
| New Mexico | 2020-03-24 | 1.6666667 | 0.0010000 |
| Louisiana | 2020-03-23 | 1.7500000 | 0.0029697 |
| Alaska | 2020-03-28 | 1.7777778 | 0.0032000 |
| Vermont | 2020-03-25 | 1.9000000 | 0.0030000 |
| Connecticut | 2020-03-23 | 1.9090909 | 0.0027812 |
| Ohio | 2020-03-23 | 1.9090909 | 0.0030313 |
| Illinois | 2020-03-21 | 1.9166667 | 0.0028286 |
| New Jersey | 2020-03-21 | 1.9166667 | 0.0033143 |
| Hawaii | 2020-03-25 | 2.1111111 | 0.0027500 |
| Delaware | 2020-03-24 | 2.5000000 | 0.0016786 |
| Idaho | 2020-03-25 | 2.7142857 | 0.0016538 |
After examining the data, it is clear that, at least in this set, there is only a weak relationship between the length of time under a stay at home order and the decrease in COVID-19 infection rates. Idaho has the largest ratio of days after versus before the order, yet its average daily decrease was nowhere near the maximum value. The same holds for South Carolina, which has the smallest ratio but a fairly typical daily decrease. This does not mean that stay at home orders are not effective, however. In fact, the data suggests quite the opposite. The average infection rate before a state’s order was 0.1147, while the average infection rate after an order went into effect was 0.04426. This is a substantial difference, and it suggests that stay at home orders are useful all over the country.
This weak relationship between time and effectiveness is most likely due to the limited data set. The data set has only one before-and-after pair of infection rate measurements for each state, which does not allow an in-depth view of trends and patterns that might be more obvious in larger sets. Another issue is that infection rates and their changes differ between areas with different population densities, which can skew the data one way or another. Yet another possibility is that the time intervals in this data set are too short to show any substantial relationship. Nonetheless, the data indicates that stay at home orders are a good way for states to curb the spread of COVID-19 among their citizens.